Executive Summary
COVID-19 data for the United Kingdom and the 27 European Union countries have been analysed to compare the numbers of confirmed cases and deaths in the United Kingdom and the EU countries over the period from 22 January 2020 to 18 January 2021.
In terms of the total numbers of both confirmed cases and deaths, the United Kingdom ranks highest of all 28 countries, followed by France and Italy. However, a more reliable comparison must take the population size of each country into account.
The ratio calculated for this purpose, the number of confirmed cases per 100,000 inhabitants, places Czechia, Luxembourg and Slovenia at the top of the list. The United Kingdom is in the top third (ninth of 28). The countries with the lowest confirmed cases ratio are Finland, Greece and Germany.
For the death rate, calculated as the number of deaths per 100,000 inhabitants, Belgium has the highest ratio, followed by Slovenia and Italy. The United Kingdom ranks fifth, just behind Czechia. Finland again has the lowest rate, followed by Cyprus and Estonia.
Data summary
Summary of data for the United Kingdom
The first period of significant increase in new confirmed cases occurred between March and June, with a peak of 5,490 cases in late April. The second occurred between early October and January, peaking at 68,053 cases on 8 January 2021. For virus-related deaths, a first period can be distinguished between mid-March and June. During this period the number of new deaths rose sharply until April, reaching 1,224, and then dropped to near zero. The second period of noticeable increase began in October. After a slight decline in mid-December, it reached its highest values in mid-January 2021.
Both series show two periods of growth with a significant decline in between. The decrease in the daily numbers of confirmed cases and deaths is most likely due to the authorities' decision to introduce significant restrictions on residents' movement. The same decision limited the operation of some businesses, especially those providing on-site customer service.
The relaxation of the restrictions, together with a new, faster-spreading strain of the virus, likely caused the second phase of growth to be much higher than the first. The noticeable decline around mid-January 2021 may be related to new restrictions introduced by the Government of the United Kingdom.
Summary of data for all European countries
Each of the 28 countries experienced an increase in the total number of cases in the first quarter of 2020. In most countries, the number of cases levelled off in the middle of the year and then increased significantly towards the end of 2020. The same was true of the total number of deaths.
Among all the European countries compared, the United Kingdom has the highest total numbers of both deaths and confirmed cases.
To make the comparison fairer, each country's population should be taken into account. For this purpose, the numbers of deaths and of confirmed cases per 100,000 inhabitants have been calculated.
Czechia, Luxembourg and Slovenia have the highest ratios of confirmed COVID-19 cases. The United Kingdom is in the top third of the list (ninth place), while the last three places belong to Germany, Greece and Finland.
The countries with the highest death ratios are Belgium, Slovenia and Italy, whereas Finland, Cyprus and Estonia have the lowest. The United Kingdom is in fifth place.
Table 1 summarises the minimum, average and maximum values of the cumulative numbers of confirmed cases and deaths and of their ratios per 100,000 inhabitants.
Table 1: Summary table with minimum, average and maximum values for covid_eu_uk_selected_merged_latest_day dataset
| Statistic | Confirmed cases (cumulative) | Deaths (cumulative) | Confirmed cases per 100,000 | Deaths per 100,000 |
|---|---|---|---|---|
| Minimum | 15,742 | 175 | 734.70 | 11.21 |
| Average | 760,652.2 | 18,411.75 | 4,359.62 | 86.95 |
| Maximum | 3,433,494 | 89,860 | 8,405.72 | 179.60 |
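Table 2 lists the confirmed cases ratio for all 28 countries, so the confirmed cases ratio column of Table 1 can be cross-checked against it. The report does not show its own code. The following is a minimal Python sketch, and `conf_cases_ratio` and `summarise` are illustrative names only.

```python
from statistics import mean

# Confirmed cases per 100,000 inhabitants on 2021-01-18, copied from Table 2.
conf_cases_ratio = {
    "Czechia": 8405.72, "Luxembourg": 8121.20, "Slovenia": 7229.45,
    "Lithuania": 5980.70, "Belgium": 5963.64, "Croatia": 5483.58,
    "Portugal": 5407.65, "Netherlands": 5339.06, "United Kingdom": 5180.79,
    "Sweden": 5172.66, "Spain": 5007.56, "Austria": 4476.62,
    "France": 4345.12, "Slovakia": 4122.36, "Italy": 3951.63,
    "Poland": 3788.94, "Ireland": 3619.64, "Hungary": 3606.97,
    "Romania": 3559.30, "Cyprus": 3370.61, "Malta": 3309.22,
    "Denmark": 3284.70, "Bulgaria": 3012.51, "Latvia": 2890.23,
    "Estonia": 2830.95, "Germany": 2487.41, "Greece": 1386.49,
    "Finland": 734.70,
}


def summarise(values):
    """Return the minimum, average and maximum of a collection of numbers."""
    values = list(values)
    return {"Minimum": min(values), "Average": mean(values), "Maximum": max(values)}


summary = summarise(conf_cases_ratio.values())
for statistic, value in summary.items():
    print(f"{statistic}: {value:.3f}")
# Minimum: 734.700
# Average: 4359.622
# Maximum: 8405.720
```

The same function applies to the other columns once all 28 values are available. Table 4 shows only a preview of the dataset, so those values are not reproduced here.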
Confirmed cases ratio vs population density
It is worth investigating whether the countries with the highest confirmed cases ratios are also the most densely populated. To compare these data, a grouped bar chart has been prepared, with the countries sorted by their confirmed COVID-19 cases ratio.
Malta has the highest population density, yet it is only twenty-first of the 28 countries in terms of confirmed cases ratio. Finland has both the lowest number of confirmed cases per 100,000 people and the lowest population density.
Other countries show that a higher population density cannot be directly associated with a higher confirmed COVID-19 cases ratio. The Netherlands has the second-highest density but ranks eighth by ratio. By contrast, Slovenia, Lithuania and Sweden rank third, fourth and tenth by ratio despite relatively low densities of 102.9, 44.7 and 25.0 persons per km² respectively.
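Each country's position in both rankings can be computed from the values in Table 2. For example, Malta comes first by density but twenty-first by confirmed cases ratio. The strength of the overall link can be summarised with Spearman's rank correlation, a technique the report itself does not apply. This minimal Python sketch uses illustrative function names. With these 28 values the coefficient comes out at about 0.32, a weak association in line with the observation that density alone does not determine a country's position. It is a descriptive figure, not a significance test.

```python
# Population density (persons per km2) and confirmed cases per 100,000
# inhabitants, copied from Table 2.
data = {
    "Malta": (1548.3, 3309.22), "Netherlands": (504.0, 5339.06),
    "Belgium": (375.3, 5963.64), "United Kingdom": (273.8, 5180.79),
    "Luxembourg": (235.1, 8121.20), "Germany": (234.7, 2487.41),
    "Italy": (202.9, 3951.63), "Denmark": (138.0, 3284.70),
    "Czechia": (137.7, 8405.72), "Poland": (123.6, 3788.94),
    "Portugal": (113.0, 5407.65), "Slovakia": (111.8, 4122.36),
    "Austria": (107.1, 4476.62), "Hungary": (107.1, 3606.97),
    "France": (105.6, 4345.12), "Slovenia": (102.9, 7229.45),
    "Cyprus": (94.4, 3370.61), "Spain": (93.1, 5007.56),
    "Romania": (83.1, 3559.30), "Greece": (82.5, 1386.49),
    "Croatia": (73.2, 5483.58), "Ireland": (70.9, 3619.64),
    "Bulgaria": (63.9, 3012.51), "Lithuania": (44.7, 5980.70),
    "Estonia": (30.4, 2830.95), "Latvia": (30.4, 2890.23),
    "Sweden": (25.0, 5172.66), "Finland": (18.1, 734.70),
}


def descending_ranks(values):
    """Rank values from highest (1) to lowest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


countries = list(data)
density_rank = descending_ranks([data[c][0] for c in countries])
ratio_rank = descending_ranks([data[c][1] for c in countries])

# Spearman's rho is the Pearson correlation of the two rank vectors.
rho = pearson(density_rank, ratio_rank)

for name in ("Malta", "United Kingdom", "Finland"):
    i = countries.index(name)
    print(f"{name}: density rank {density_rank[i]:g}, cases ratio rank {ratio_rank[i]:g}")
print(f"Spearman rho = {rho:.2f}")
```

Averaging the ranks of tied values (Austria and Hungary, Estonia and Latvia share a density) is the usual convention for Spearman's coefficient.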
Table 2: Population density and COVID-19 confirmed cases ratio by country, sorted from highest to lowest value
Sorted by population density:

| Country | Population density (persons per km²) | Confirmed cases per 100,000 inhabitants |
|---|---|---|
| Malta | 1548.3 | 3309.22 |
| Netherlands | 504.0 | 5339.06 |
| Belgium | 375.3 | 5963.64 |
| United Kingdom | 273.8 | 5180.79 |
| Luxembourg | 235.1 | 8121.20 |
| Germany | 234.7 | 2487.41 |
| Italy | 202.9 | 3951.63 |
| Denmark | 138.0 | 3284.70 |
| Czechia | 137.7 | 8405.72 |
| Poland | 123.6 | 3788.94 |
| Portugal | 113.0 | 5407.65 |
| Slovakia | 111.8 | 4122.36 |
| Austria | 107.1 | 4476.62 |
| Hungary | 107.1 | 3606.97 |
| France | 105.6 | 4345.12 |
| Slovenia | 102.9 | 7229.45 |
| Cyprus | 94.4 | 3370.61 |
| Spain | 93.1 | 5007.56 |
| Romania | 83.1 | 3559.30 |
| Greece | 82.5 | 1386.49 |
| Croatia | 73.2 | 5483.58 |
| Ireland | 70.9 | 3619.64 |
| Bulgaria | 63.9 | 3012.51 |
| Lithuania | 44.7 | 5980.70 |
| Estonia | 30.4 | 2830.95 |
| Latvia | 30.4 | 2890.23 |
| Sweden | 25.0 | 5172.66 |
| Finland | 18.1 | 734.70 |

Sorted by confirmed cases ratio:

| Country | Confirmed cases per 100,000 inhabitants | Population density (persons per km²) |
|---|---|---|
| Czechia | 8405.72 | 137.7 |
| Luxembourg | 8121.20 | 235.1 |
| Slovenia | 7229.45 | 102.9 |
| Lithuania | 5980.70 | 44.7 |
| Belgium | 5963.64 | 375.3 |
| Croatia | 5483.58 | 73.2 |
| Portugal | 5407.65 | 113.0 |
| Netherlands | 5339.06 | 504.0 |
| United Kingdom | 5180.79 | 273.8 |
| Sweden | 5172.66 | 25.0 |
| Spain | 5007.56 | 93.1 |
| Austria | 4476.62 | 107.1 |
| France | 4345.12 | 105.6 |
| Slovakia | 4122.36 | 111.8 |
| Italy | 3951.63 | 202.9 |
| Poland | 3788.94 | 123.6 |
| Ireland | 3619.64 | 70.9 |
| Hungary | 3606.97 | 107.1 |
| Romania | 3559.30 | 83.1 |
| Cyprus | 3370.61 | 94.4 |
| Malta | 3309.22 | 1548.3 |
| Denmark | 3284.70 | 138.0 |
| Bulgaria | 3012.51 | 63.9 |
| Latvia | 2890.23 | 30.4 |
| Estonia | 2830.95 | 30.4 |
| Germany | 2487.41 | 234.7 |
| Greece | 1386.49 | 82.5 |
| Finland | 734.70 | 18.1 |
|
Conclusions
All countries experienced two major periods of increase in COVID-19 cases and deaths over similar periods of time. It would be worth investigating the reasons for these rises and falls, using data on country-specific restrictions and other factors such as new virus variants. However, such data were outside the scope of this report.
No evidence was found to support the idea that countries with the highest population density have the highest virus infection rate.
Data preparation
1. Additional datasets
Dataset containing country codes
The source of the dataset with country codes
This dataset was retrieved from the United Nations Statistics Division (UNSD). It was used to obtain country names for the two other additional datasets, which contain population and population density data. For this purpose, the two columns containing country names and country codes were selected and saved.
Datasets containing data about population and population density
The source of the dataset with population of European countries
The source of the dataset with population density of European countries
These datasets were retrieved from Eurostat, the statistical office of the European Union. The first dataset was used to obtain the populations of European countries. Some values were flagged with "p", meaning provisional; these flags were removed and the values formatted as numbers. The most recent year in the population density dataset was 2018, so the population data for 2018 were used as well. The first column of each dataset contained text strings. These strings were used to remove redundant regional data and to extract the country codes.
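The cleaning step can be sketched in Python under stated assumptions. The sketch assumes the first column of a Eurostat download joins several dimension codes with commas and ends with the geographic code, and that flags such as "p" follow the number after a space. The rows, the dimension codes `NR,TOTAL,T` and the function names are hypothetical; only the population figures for AT, EL and UK come from Tables 3 and 4. The sketch also assumes that two-letter codes denote countries, while longer codes denote regions or aggregates.

```python
import re

# Hypothetical rows in the assumed Eurostat layout: dimension codes joined by
# commas, ending with the geographic code; values may carry a flag like "p".
raw_rows = [
    ("NR,TOTAL,T,AT", "8822267 p"),
    ("NR,TOTAL,T,AT1", "1234567 p"),  # regional row, placeholder value
    ("NR,TOTAL,T,EL", "10741165"),
    ("NR,TOTAL,T,UK", "66273576"),
]


def parse_value(text):
    """Drop any trailing flag and return the number, or None if no number is present."""
    match = re.match(r"\s*([0-9.]+)", text)
    return float(match.group(1)) if match else None


def country_level(rows):
    """Keep country rows (two-letter geo codes) and drop regional or aggregate rows."""
    result = {}
    for codes, value in rows:
        geo = codes.split(",")[-1].strip()
        if len(geo) == 2:
            result[geo] = parse_value(value)
    return result


population = country_level(raw_rows)
print(population)  # {'AT': 8822267.0, 'EL': 10741165.0, 'UK': 66273576.0}
```

Missing values, which Eurostat marks with a colon, come back as `None` rather than raising an error.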
The population and population density data were combined into a single dataset containing only the 28 countries (the UK and the EU countries), and the country names from the country code dataset were added to it.
The resulting dataset lacked the names of two countries because the European Union and the UNSD use different codes for them. The United Kingdom has the UNSD country code GB but the European Union code UK. Similarly, Greece is GR according to the UNSD and EL in the European Union. The names of these two countries were then added.
Table 3: Missing values in eu_uk_pop_and_pop_density_merged dataset
| Country | Country_Code | Population_2018 | Pop_density_[persons_per_km2] |
|---|---|---|---|
| NA | EL | 10741165 | 82.5 |
| NA | UK | 66273576 | 273.8 |
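Instead of filling in the names afterwards, the two differing codes can be translated before the lookup. Below is a minimal Python sketch of that alternative, assuming the code-to-name list is held in a simple dictionary. `unsd_names`, `EUROSTAT_TO_UNSD` and `add_country_names` are hypothetical names, and the names in the real UNSD file may be spelled differently.

```python
# Hypothetical excerpt of the UNSD code list (two-letter code -> country name).
unsd_names = {"AT": "Austria", "GB": "United Kingdom", "GR": "Greece"}

# The two Eurostat codes that differ from the UNSD codes.
EUROSTAT_TO_UNSD = {"UK": "GB", "EL": "GR"}


def add_country_names(rows, names, recode=True):
    """Attach a country name to each row keyed by a Eurostat code.

    With recode=False the Eurostat code is looked up as-is, which leaves the
    names for UK and EL empty (None), as in Table 3."""
    named = []
    for row in rows:
        code = row["Country_Code"]
        if recode:
            code = EUROSTAT_TO_UNSD.get(code, code)
        named.append({**row, "Country": names.get(code)})
    return named


rows = [
    {"Country_Code": "AT", "Population_2018": 8822267},
    {"Country_Code": "EL", "Population_2018": 10741165},
    {"Country_Code": "UK", "Population_2018": 66273576},
]
print([r["Country"] for r in add_country_names(rows, unsd_names, recode=False)])
# ['Austria', None, None]
print([r["Country"] for r in add_country_names(rows, unsd_names)])
# ['Austria', 'Greece', 'United Kingdom']
```

The original Eurostat code is kept in `Country_Code`, so the merged rows can still be matched back to the source files.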
2. Datasets containing COVID-19 confirmed cases and deaths
The source of the dataset with COVID-19 confirmed cases
The source of the dataset with COVID-19 deaths
Both datasets contain global cumulative counts of COVID-19 confirmed cases and deaths, respectively. They were downloaded as raw .csv files from the Kaggle website, where they are described as a regularly updated version of the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU).
The two datasets have an identical structure. They contain columns with country/region names, province/state names, and latitude and longitude, followed by 363 columns of COVID-19 data, one for each day from 22/01/2020 to 18/01/2021.
They were reshaped into long format and filtered to obtain the values for the United Kingdom and the 27 EU countries.
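The reshaping step can be sketched in Python with the standard `csv` module. The sketch assumes the identifier columns are named `Province/State`, `Country/Region`, `Lat` and `Long`, and that the date columns are written as m/d/yy. The excerpt is hypothetical: the Austria and Belgium coordinates come from Table 4, while the Norway row and all counts are placeholders. The report does not say how rows for overseas provinces were treated, so the sketch leaves that question aside.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical excerpt in the wide layout: identifier columns followed by one
# column per day. Counts and the Norway row are placeholders, not real data.
WIDE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Austria,47.5162,14.5501,1,2,3
,Belgium,50.8333,4.469936,4,5,6
,Norway,0.0,0.0,7,8,9
"""

ID_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}
SELECTED = {"Austria", "Belgium"}  # in the report: the UK and the 27 EU countries


def wide_to_long(text, countries):
    """Turn one-column-per-day rows into one row per country and day."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["Country/Region"] not in countries:
            continue  # keep only the selected countries
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            long_rows.append({
                "Date": datetime.strptime(column, "%m/%d/%y").date(),
                "Country": row["Country/Region"],
                "Lat": float(row["Lat"]),
                "Long": float(row["Long"]),
                "covid_conf_cases_cum": int(value),
            })
    return long_rows


long_rows = wide_to_long(WIDE_CSV, SELECTED)
for r in long_rows:
    print(r["Date"], r["Country"], r["covid_conf_cases_cum"])
```

With 363 date columns, each selected country produces 363 long-format rows.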
3. Combining prepared datasets into one
The COVID-19 confirmed cases and deaths data were merged with the population and population density data. Two new columns were then calculated and added: the numbers of confirmed COVID-19 cases and of deaths per 100,000 inhabitants.
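The ratio is the cumulative count divided by the 2018 population, multiplied by 100,000. As a check, this minimal Python sketch reproduces two rows of Table 4 from their own population and cumulative counts. `per_100k` is an illustrative name.

```python
def per_100k(count, population):
    """Express a count (confirmed cases or deaths) per 100,000 inhabitants."""
    return count / population * 100_000


# Values for 2021-01-18 from Table 4: (population 2018, cumulative cases, cumulative deaths).
table4 = {
    "Austria": (8822267, 394939, 7122),
    "Belgium": (11398589, 679771, 20472),
}

for country, (population, cases, deaths) in table4.items():
    print(country, round(per_100k(cases, population), 2), round(per_100k(deaths, population), 2))
# Austria 4476.62 80.73
# Belgium 5963.64 179.6
```

Because the denominator is the 2018 population, the ratios are approximate for 2020 and 2021. They are still comparable between countries, since the same reference year is used throughout.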
Because the COVID-19 figures in this dataset are cumulative, the values for the last available day (18 January 2021) give the totals, so the data were filtered to that day.
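Keeping only the latest day per country can be sketched in Python as follows. The rows and counts are hypothetical placeholders, and `latest_per_country` is an illustrative name.

```python
from datetime import date

# Hypothetical long-format rows with cumulative counts (placeholder values).
rows = [
    {"Date": date(2021, 1, 16), "Country": "Austria", "covid_conf_cases_cum": 100},
    {"Date": date(2021, 1, 18), "Country": "Austria", "covid_conf_cases_cum": 180},
    {"Date": date(2021, 1, 17), "Country": "Austria", "covid_conf_cases_cum": 150},
    {"Date": date(2021, 1, 17), "Country": "Belgium", "covid_conf_cases_cum": 40},
    {"Date": date(2021, 1, 18), "Country": "Belgium", "covid_conf_cases_cum": 55},
]


def latest_per_country(rows):
    """Keep, for each country, only the row with the most recent date."""
    latest = {}
    for row in rows:
        current = latest.get(row["Country"])
        if current is None or row["Date"] > current["Date"]:
            latest[row["Country"]] = row
    return latest


for country, row in latest_per_country(rows).items():
    print(country, row["Date"], row["covid_conf_cases_cum"])
# Austria 2021-01-18 180
# Belgium 2021-01-18 55
```

When every country's series ends on the same day, as here, filtering on the single overall maximum date gives the same result. The per-country version also works if some series stop earlier.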
Table 4: Preview of covid_eu_uk_selected_merged_latest_day dataset
| Date | Country | Lat | Long | Population_2018 | Pop_density_[persons_per_km2] | covid_conf_cases_cum | covid_deaths_cum | Conf_cases_Ratio | Deaths_Ratio |
|---|---|---|---|---|---|---|---|---|---|
| 2021-01-18 | Austria | 47.5162 | 14.550100 | 8822267 | 107.1 | 394939 | 7122 | 4476.62 | 80.73 |
| 2021-01-18 | Belgium | 50.8333 | 4.469936 | 11398589 | 375.3 | 679771 | 20472 | 5963.64 | 179.60 |
| 2021-01-18 | Bulgaria | 42.7339 | 25.485800 | 7050034 | 63.9 | 212383 | 8565 | 3012.51 | 121.49 |
| 2021-01-18 | Croatia | 45.1000 | 15.200000 | 4105493 | 73.2 | 225128 | 4655 | 5483.58 | 113.38 |
| 2021-01-18 | Cyprus | 35.1264 | 33.429900 | 864236 | 94.4 | 29130 | 175 | 3370.61 | 20.25 |
| 2021-01-18 | Czechia | 49.8175 | 15.473000 | 10610055 | 137.7 | 891852 | 14449 | 8405.72 | 136.18 |
To present the situation in the United Kingdom, the data were filtered to the United Kingdom. Two new columns were then added, containing the numbers of new confirmed cases and new deaths on each day.
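One common way to derive daily new counts from cumulative ones is to take the difference between consecutive days. The report does not state its exact method, so this minimal Python sketch is an assumption. `daily_new` is an illustrative name and the series is a placeholder, not the UK data.

```python
def daily_new(cumulative):
    """Daily new counts as differences between consecutive cumulative values.

    The first day has no previous value, so it keeps its cumulative count
    (an assumption; it could equally be left missing)."""
    if not cumulative:
        return []
    return [cumulative[0]] + [
        today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])
    ]


# Placeholder cumulative series, not the UK data.
print(daily_new([0, 2, 5, 5, 9]))  # [0, 2, 3, 0, 4]
```

If a cumulative series is revised downwards on some day, this approach yields a negative daily value. Such values may need checking before the peaks are read off.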